Conversation
Fixes #51

Add support for JavaScript code analysis using tree-sitter.

* Add `api/analyzers/javascript/analyzer.py` implementing a `JavaScriptAnalyzer` class using tree-sitter for JavaScript.
  - Implement methods for first- and second-pass analysis.
  - Use tree-sitter to parse JavaScript code.
  - Extract functions and classes from JavaScript code.
  - Connect entities in the graph.
* Update `api/analyzers/source_analyzer.py` to include `JavaScriptAnalyzer` in the analyzers list.
* Add the `tree-sitter-javascript` dependency to `pyproject.toml`.
* Add utility functions for JavaScript analysis in `api/analyzers/utils.py`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/FalkorDB/code-graph-backend/issues/51?shareId=XXXX-XXXX-XXXX-XXXX).
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds JavaScript support: a new `JavaScriptAnalyzer` built on tree-sitter, wired into `SourceAnalyzer`, plus JavaScript utility helpers and the `tree-sitter-javascript` dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant FS as FileSystem (*.js)
    participant Analyzer as JavaScriptAnalyzer
    participant LSP as JS Language Server
    participant Graph as CodeGraph
    rect rgba(100,149,237,0.5)
        FS->>Analyzer: provide file contents (first_pass)
        Analyzer->>Analyzer: parse with tree-sitter, create Entities
        Analyzer->>Graph: add Entities (declarations)
    end
    rect rgba(60,179,113,0.5)
        Analyzer->>LSP: request symbol/type info (second_pass)
        LSP-->>Analyzer: return resolution data
        Analyzer->>Graph: add edges (calls, inherits, references)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (6)
api/analyzers/javascript/analyzer.py (3)
22-71: Enhance function declaration processing and handle arrow functions.

Currently, `process_function_declaration` only captures standard function declarations. If you'd like to capture arrow functions or function expressions, consider expanding the query or logic. Also, note that using `find_child_of_type(node, 'identifier')` might skip function declarations without a named identifier (like anonymous functions).

🧰 Tools
🪛 Ruff (0.8.2)

22-22: `Function` may be undefined, or defined from star imports (F405)

35-35: `find_child_of_type` may be undefined, or defined from star imports (F405)

45-45: `find_child_of_type` may be undefined, or defined from star imports (F405)

62-62: `Function` may be undefined, or defined from star imports (F405)
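The broadened capture set the comment suggests could look roughly like this. This is a sketch, not the repository's actual query; the node type names (`function_expression`, `arrow_function`, `generator_function_declaration`) follow the tree-sitter-javascript grammar and should be checked against the grammar version actually in use:

```scheme
; Hypothetical wider capture set for the first pass:
; named declarations plus the anonymous/expression forms.
[
  (function_declaration) @function
  (function_expression) @function
  (arrow_function) @function
  (generator_function_declaration) @function
]
```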
72-102: Extend class declaration handling for inheritance.

This method correctly extracts the class name from the `identifier` child. You may want to handle `extends` clauses (e.g., `class Foo extends Bar`) or keep track of implemented interfaces in the future.

🧰 Tools
🪛 Ruff (0.8.2)

72-72: `Class` may be undefined, or defined from star imports (F405)

85-85: `find_child_of_type` may be undefined, or defined from star imports (F405)

99-99: `Class` may be undefined, or defined from star imports (F405)
157-220: Protect against missing function entities and arrow function calls.

- The second pass currently assumes function declarations are always standard. Arrow functions won't be captured, so the calls might remain unresolved.
- `assert(caller_f is not None)` may crash if the function is somehow not recognized. Consider a safer check, logging a warning, or creating a placeholder entity to avoid halting the entire analysis.

```diff
-assert(caller_f is not None)
+if caller_f is None:
+    logger.warning(f"Caller function '{caller_name}' not found. Skipping relationship.")
+    continue
```

🧰 Tools
🪛 Ruff (0.8.2)

216-216: `Function` may be undefined, or defined from star imports (F405)
api/analyzers/utils.py (2)
25-38: Augment arrow function or unnamed function handling.

`extract_js_function_name` assumes there's an identifier child. Consider fallback logic for arrow or anonymous functions if needed (e.g., generating a placeholder name).
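One possible fallback, shown against a minimal stand-in for tree-sitter's `Node` interface (`type`, `text`, `children`, `parent`); the real analyzer would operate on actual tree-sitter nodes, and the placeholder name is illustrative:

```python
# Hypothetical fallback naming for anonymous and arrow functions.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Minimal stand-in mimicking tree-sitter's Node interface."""
    type: str
    text: bytes = b""
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None


def js_function_name(node: Node) -> str:
    """Best-effort name: identifier child, else enclosing declarator, else placeholder."""
    for child in node.children:
        if child.type == "identifier":
            return child.text.decode("utf-8")
    # Anonymous/arrow functions: borrow the name from an enclosing
    # variable_declarator, e.g. `const add = (a, b) => a + b;`
    if node.parent is not None and node.parent.type == "variable_declarator":
        for child in node.parent.children:
            if child.type == "identifier":
                return child.text.decode("utf-8")
    return "<anonymous>"
```

The placeholder keeps anonymous functions visible in the graph instead of silently dropping them.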
40-53: Handle anonymous or default-export classes.

Similar to functions, classes can sometimes be declared without a direct identifier (`export default class`, etc.). Consider a fallback name or a distinct approach for these cases.
api/analyzers/source_analyzer.py (1)
20-21: Consider broader JavaScript-related extensions.

Mapping `.js` to `JavaScriptAnalyzer` is a good start. If you plan to handle `.mjs`, `.cjs`, or `.jsx`, consider extending the analyzers dictionary or making it more flexible.
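A sketch of the more flexible mapping; the analyzer class here is a stand-in, not the repository's real `JavaScriptAnalyzer`:

```python
# Hypothetical sketch: route every JavaScript-family extension the review
# mentions (.mjs, .cjs, .jsx) to a single analyzer.
from pathlib import Path


class JavaScriptAnalyzer:
    """Stand-in for the real analyzer class."""


_JS_EXTENSIONS = (".js", ".mjs", ".cjs", ".jsx")

# One analyzer class serves all JavaScript-family extensions.
analyzers = {ext: JavaScriptAnalyzer for ext in _JS_EXTENSIONS}


def analyzer_for(path: str):
    """Return the analyzer class registered for the file's extension, or None."""
    return analyzers.get(Path(path).suffix.lower())
```

Lower-casing the suffix also covers files with upper-case extensions on case-insensitive filesystems.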
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `api/analyzers/javascript/analyzer.py` (1 hunks)
- `api/analyzers/source_analyzer.py` (2 hunks)
- `api/analyzers/utils.py` (1 hunks)
- `pyproject.toml` (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/analyzers/javascript/analyzer.py
3-3: from ..utils import * used; unable to detect undefined names
(F403)
5-5: from ...entities import * used; unable to detect undefined names
(F403)
22-22: Function may be undefined, or defined from star imports
(F405)
35-35: find_child_of_type may be undefined, or defined from star imports
(F405)
45-45: find_child_of_type may be undefined, or defined from star imports
(F405)
62-62: Function may be undefined, or defined from star imports
(F405)
72-72: Class may be undefined, or defined from star imports
(F405)
85-85: find_child_of_type may be undefined, or defined from star imports
(F405)
99-99: Class may be undefined, or defined from star imports
(F405)
123-123: File may be undefined, or defined from star imports
(F405)
216-216: Function may be undefined, or defined from star imports
(F405)
🔇 Additional comments (5)
api/analyzers/javascript/analyzer.py (3)
18-21: Analyzer constructor looks good.

The constructor properly initializes the Parser with `JS_LANGUAGE`. This is a succinct and clear approach.
103-156: Consider robust UTF-8 handling.

The code tries to decode the file content using UTF-8, which might fail if the file is in another encoding. This fallback is fine in many cases, but you might want to log or handle suspected encoding issues more explicitly if supporting diverse codebases.
🧰 Tools
🪛 Ruff (0.8.2)
123-123: `File` may be undefined, or defined from star imports (F405)
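A defensive decode along the lines the comment suggests might look like this; a sketch, not the repository's actual code:

```python
# Hypothetical sketch: try UTF-8 first, log the problem, then fall back to
# replacement characters instead of aborting the whole analysis on one
# mis-encoded file.
import logging

logger = logging.getLogger(__name__)


def decode_source(raw: bytes, path: str) -> str:
    """Decode file bytes, degrading gracefully on non-UTF-8 content."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        logger.warning(
            "Non-UTF-8 content in %s (%s); using replacement characters", path, exc
        )
        return raw.decode("utf-8", errors="replace")
```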
10-13: Confirm Tree-sitter JavaScript library availability.

The import statements `import tree_sitter_javascript as tsjs` and `Language, Parser` from `tree_sitter` look correct for parsing JavaScript. Just verify that the `pyproject.toml` changes (adding `tree-sitter-javascript`) are indeed installed in your environment.

✅ Verification successful

tree-sitter-javascript dependency is properly configured

The `pyproject.toml` file shows that `tree-sitter-javascript` version `^0.23.2` is correctly listed as a dependency in the project's Poetry configuration. This confirms that the import statement `import tree_sitter_javascript as tsjs` will work as expected when the dependencies are installed.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Ensure tree-sitter-javascript is installed in the current environment
pip freeze | grep "tree-sitter-javascript"
```
Length of output: 136

Script:
```shell
#!/bin/bash
# Check if pyproject.toml exists and contains tree-sitter-javascript dependency
fd pyproject.toml --exec cat {}
```
Length of output: 877

Script:
```shell
#!/bin/bash
# Check if requirements.txt exists and contains tree-sitter-javascript dependency
fd requirements.txt --exec cat {}
```
Length of output: 65574

Script:
```shell
#!/bin/bash
# Check if setup.py exists and contains tree-sitter-javascript dependency
fd setup.py --exec cat {}
```
Length of output: 29

Script:
```shell
#!/bin/bash
# Look for any other Python dependency files
fd ".*requirements.*\.txt|poetry\.lock|Pipfile" --exec cat {}
```
Length of output: 65602
api/analyzers/source_analyzer.py (1)
11-11: JavaScript analyzer import addition confirmed.

Adding `JavaScriptAnalyzer` aligns with the pull request objective to support `.js` files.

pyproject.toml (1)
16-16: Dependency addition looks appropriate.

`tree-sitter-javascript = "^0.23.2"` is correctly specified. Validate that this version meets any advanced language feature requirements you may want to parse (e.g., newer JS syntax).

✅ Verification successful
Version 0.23.2 is appropriate for the JavaScript parsing requirements
Based on the code analysis, the tree-sitter-javascript dependency is used for basic JavaScript AST parsing, focusing on:
- Function declarations and their parameters
- Class declarations
- Function call expressions
- Basic identifier extraction
The specified version (^0.23.2) is actually newer than the latest PyPI version (0.23.1) and supports all the JavaScript syntax features currently being used in the analyzer. The codebase doesn't use any advanced JavaScript features that would require a newer version, as it's primarily focused on fundamental syntax elements like functions, classes, and method calls.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Check if the newly added dependency is at the latest stable version
curl -s https://pypi.org/pypi/tree-sitter-javascript/json | jq '.info.version'
```
Length of output: 87

Script:
```shell
#!/bin/bash
# Check if there are any JavaScript files in the repository to understand the JS features being used
fd -e js -x head -n 50
```
Length of output: 24

Script:
```shell
#!/bin/bash
# Look for any configuration files that might specify JavaScript version/features
fd -g '*babel*' -g '*eslint*' -g '*tsconfig*' -g '*jsconfig*'
```
Length of output: 273

Script:
```shell
#!/bin/bash
# Check if tree-sitter-javascript is already in use and how
rg -l "tree-sitter-javascript"
```
Length of output: 57

Script:
```shell
#!/bin/bash
# Check the content of poetry.lock to see how tree-sitter-javascript is being used
cat poetry.lock | grep -A 10 "tree-sitter-javascript"
```
Length of output: 678

Script:
```shell
#!/bin/bash
# Look for Python files that might be using tree-sitter-javascript
rg -l "from tree_sitter"
```
Length of output: 147

Script:
```shell
#!/bin/bash
# Check how tree-sitter-javascript is being used in the JavaScript analyzer
cat api/analyzers/javascript/analyzer.py
```
Length of output: 7853

Script:
```shell
#!/bin/bash
# Check utils.py to understand the common tree-sitter usage patterns
cat api/analyzers/utils.py
```
Length of output: 1429
`api/analyzers/utils.py` (Outdated)
```python
def extract_js_function_name(node: Node) -> str:
    """
    Extract the function name from a JavaScript function node.

    Args:
        node (Node): The AST node representing the function.

    Returns:
        str: The name of the function.
    """
    for child in node.children:
        if child.type == 'identifier':
            return child.text.decode('utf-8')
    return ''


def extract_js_class_name(node: Node) -> str:
    """
    Extract the class name from a JavaScript class node.

    Args:
        node (Node): The AST node representing the class.

    Returns:
        str: The name of the class.
    """
    for child in node.children:
        if child.type == 'identifier':
            return child.text.decode('utf-8')
    return ''
```
Language-specific utilities shouldn't be added to `utils`.
`api/analyzers/javascript/analyzer.py` (Outdated)
```python
"""
Perform the first pass processing of a JavaScript source file.

Args:
    path (Path): The path to the JavaScript source file.
    f (io.TextIOWrapper): The file object representing the opened JavaScript source file.
    graph (Graph): The Graph object where entities will be added.

Returns:
    None
"""
```
The comment should include information about what the function does; "processing JavaScript file" is too general. Specify which entities are extracted.
`api/analyzers/javascript/analyzer.py` (Outdated)
```python
try:
    # Parse file
    content = f.read()
    tree = self.parser.parse(content)
except Exception as e:
    logger.error(f"Failed to process file {path}: {e}")
    return
```
I think this is a bit of a waste, we've already read the file and parsed it on the first pass.
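One way to address this, sketched with a stand-in `parse_fn` rather than the real tree-sitter `Parser` (names are hypothetical): cache the tree built in the first pass and reuse it in the second.

```python
# Hypothetical sketch: parse each file once, reuse the tree across passes.

class TreeCache:
    def __init__(self, parse_fn):
        self._parse = parse_fn   # stands in for tree_sitter.Parser.parse
        self._trees = {}

    def tree_for(self, path, content):
        """Return the parse tree for `path`, parsing only on first request."""
        if path not in self._trees:
            self._trees[path] = self._parse(content)
        return self._trees[path]
```

The second pass then asks the cache for the tree instead of re-reading and re-parsing the file.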
`api/analyzers/javascript/analyzer.py` (Outdated)
```python
caller = function_def[0]
caller_name = caller.text.decode('utf-8')
caller_f = graph.get_function_by_name(caller_name)
assert(caller_f is not None)
```
Using assert in production code is unconventional. Consider handling this case more gracefully.

```diff
-assert(caller_f is not None)
+if caller_f is None:
+    logger.error(f'Caller function not found: {caller_name}')
+    continue
```
Resolve merge conflicts with the main branch, which restructured the analyzer pattern. Rewrite `JavaScriptAnalyzer` to follow the new `AbstractAnalyzer` interface with `get_entity_label`, `get_entity_name`, `get_entity_docstring`, `get_entity_types`, `add_symbols`, `resolve_symbol`, and other required abstract methods.

Address PR review comments:

- Replace wildcard imports with explicit imports (CodeRabbit)
- Move JS-specific utils out of shared `utils.py` (swilly22)
- Remove redundant file re-parsing in second pass (swilly22)
- Replace assert with proper error handling pattern (copilot-reviewer)
- Improve docstrings to be more specific (swilly22)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```diff
 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
     path = path.resolve()
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))
```
Check failure: Code scanning / CodeQL · Uncontrolled data used in path expression (High)

Copilot Autofix (AI, 9 days ago)
General approach: ensure that any user-supplied path is validated and constrained before it is used with filesystem APIs (`Path.resolve`, `Path.rglob`, `Repository(path)`). A common pattern is to define a safe root directory (from configuration or an environment variable), resolve both the root and the requested path, and then verify that the requested path is inside the root (using `.resolve()` and a prefix / ancestor check). If the check fails, reject the request.

Best fix with minimal behavior change:

- In `api/analyzers/source_analyzer.py`, add a small helper method on `SourceAnalyzer` to validate and normalize incoming paths:
  - Accept a `str` or `Path`.
  - Resolve it to an absolute path.
  - Optionally enforce that it is under a configured root directory, if such an environment variable exists (we'll look it up inside the helper to avoid new imports here).
  - Ensure it is a directory for `analyze_local_folder`, and exists for `analyze_local_repository`.
  - Raise a clear `ValueError` (or `RuntimeError`) on violation.
- Use this helper in:
  - `analyze_local_folder`: instead of passing `Path(path)` directly, call the validator, then use the returned `Path` object for `analyze_sources`.
  - `analyze_local_repository`: use the same validator to get a normalized, allowed repo path, then pass that to both `analyze_local_folder` and `Repository(...)`.
- The endpoint in `tests/index.py` already checks `os.path.isdir(path)`, but that's only used for tests. With the new validation in `SourceAnalyzer`, any other caller (such as `api/index.py` routes that eventually call `analyze_local_folder`/`analyze_local_repository`) also gets the protection.

We can implement the helper purely inside `SourceAnalyzer` using `Path.resolve` and `Path.is_relative_to` (Python 3.9+) or a `try: relative_to` fallback. No new third-party dependencies are needed; we'll only add an `import os` in `api/analyzers/source_analyzer.py` if we choose to read an environment variable for the allowed root.

Concretely:

- Add a private method `_normalize_and_validate_path(self, path_str: str, must_be_dir: bool = True) -> Path` above `analyze_local_folder`.
- In `analyze_local_folder`, call this helper and pass the returned `Path` to `analyze_sources` instead of constructing `Path(path)` directly.
- In `analyze_local_repository`, call the same helper with `must_be_dir=True`, then use the resolved `Path` both for `analyze_local_folder` and `Repository(str(resolved_path))`.

This keeps existing functionality (scanning arbitrary directories) but ensures paths are absolute, normalized, and (optionally) within a configured safe root; if a root is not configured, we still normalize and ensure the path is a directory before traversing.
```diff
@@ -184,6 +184,24 @@
         # Second pass analysis of the source code
         self.second_pass(graph, files, path)

+    def _normalize_and_validate_path(self, path: str, must_be_dir: bool = True) -> Path:
+        """
+        Normalize and validate a user-supplied path before accessing the filesystem.
+
+        The path is resolved to an absolute path to eliminate any '..' segments.
+        If 'must_be_dir' is True, the path must exist and be a directory.
+
+        Raises:
+            ValueError: If the path is not valid for analysis.
+        """
+        base_path = Path(path)
+        resolved = base_path.resolve()
+
+        if must_be_dir and not resolved.is_dir():
+            raise ValueError(f"Path '{resolved}' must be an existing directory")
+
+        return resolved
+
     def analyze_local_folder(self, path: str, g: Graph, ignore: Optional[list[str]] = []) -> None:
         """
         Analyze path.
@@ -195,8 +213,11 @@
         logging.info(f"Analyzing local folder {path}")

+        # Normalize and validate the supplied path before analysis
+        resolved_path = self._normalize_and_validate_path(path, must_be_dir=True)
+
         # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        self.analyze_sources(resolved_path, ignore, g)

         logging.info("Done analyzing path")

@@ -213,10 +233,13 @@
         """
         from pygit2.repository import Repository

-        self.analyze_local_folder(path, ignore)
+        # Normalize and validate the repository path before accessing it
+        resolved_path = self._normalize_and_validate_path(path, must_be_dir=True)
+
+        self.analyze_local_folder(str(resolved_path), ignore)

         # Save processed commit hash to the DB
-        repo = Repository(path)
+        repo = Repository(str(resolved_path))
         head = repo.commit("HEAD")
         self.graph.set_graph_commit(head.short_id)
```
```diff
 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
     path = path.resolve()
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))
```
Check failure: Code scanning / CodeQL · Uncontrolled data used in path expression (High)

Copilot Autofix (AI, 9 days ago)
General approach: constrain and validate user-provided paths before using them in filesystem operations. At minimum, decide on a safe root directory under which all analysis must occur, normalize the requested path, and ensure the normalized path is contained within that root. This also gives CodeQL a clear, recognizable mitigation pattern (normalize then prefix-check).

Best fix in this codebase without changing existing functionality more than necessary:

- In `SourceAnalyzer.analyze_local_folder`, convert the string `path` into a normalized `Path` object, reject non-absolute or non-directory paths, and (crucially) enforce that the path lies within a configurable root directory. Use `Path.resolve()` and `.relative_to()` to ensure containment.
- Pass the validated `Path` object into `analyze_sources` instead of constructing a new `Path` from the raw string.
- Centralize the definition of the allowed root directory in `SourceAnalyzer` (e.g., an environment-variable-controlled root or default to the current working directory), so that we do not change external APIs but still restrict analysis to a subtree.
- Keep all other behavior (files discovered via `rglob`, graph creation, etc.) unchanged.

Concretely:

- In `api/analyzers/source_analyzer.py`:
  - Add an attribute (e.g., `self.root_dir`) in `SourceAnalyzer.__init__` to define the root directory from an environment variable like `CODE_GRAPH_ROOT_DIR` or default to the process working directory (`Path.cwd()`), and resolve it.
  - In `analyze_local_folder`:
    - Resolve the incoming `path` to `requested_path = Path(path).resolve()`.
    - Ensure `requested_path` is a directory (`requested_path.is_dir()`).
    - Ensure `requested_path` is inside `self.root_dir` using `requested_path.relative_to(self.root_dir)` in a `try` block; if it raises `ValueError`, log and raise an exception (or just log and return).
    - Call `self.analyze_sources(requested_path, ignore, g)` rather than recreating `Path(path)` inside.
- In `analyze_sources`, keep the existing `path = path.resolve()` and `rglob` usage; now the input has already been constrained to lie under a safe root, satisfying CodeQL's recommendation while preserving the method's behavior for internal callers.

This fix addresses all variants of the alert because every path originating from HTTP (`tests/index.py` or `api/index.py`) flows through `SourceAnalyzer.analyze_local_folder` and then into `analyze_sources`, which will now only operate within the intended root directory.
```diff
@@ -36,6 +36,17 @@

 class SourceAnalyzer():
+    def __init__(self) -> None:
+        # Define a root directory under which all analysis must occur.
+        # This can be configured via the CODE_GRAPH_ROOT_DIR environment variable
+        # and defaults to the current working directory.
+        import os
+        root_dir_env = os.environ.get("CODE_GRAPH_ROOT_DIR")
+        if root_dir_env:
+            self.root_dir = Path(root_dir_env).resolve()
+        else:
+            self.root_dir = Path.cwd().resolve()
+        self.graph: Optional[Graph] = None
     def __init__(self) -> None:
         self.files: dict[Path, File] = {}

     def supported_types(self) -> list[str]:
@@ -195,9 +206,23 @@

         logging.info(f"Analyzing local folder {path}")

-        # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        # Resolve and validate the requested path to ensure it is within the allowed root directory.
+        requested_path = Path(path).resolve()
+
+        if not requested_path.is_dir():
+            logging.error(f"Path '{requested_path}' does not exist or is not a directory")
+            raise ValueError(f"Invalid path: '{requested_path}' must be an existing directory")
+
+        try:
+            # Ensure the requested path is contained within the configured root directory.
+            requested_path.relative_to(self.root_dir)
+        except ValueError:
+            logging.error(f"Path '{requested_path}' is outside of the allowed root directory '{self.root_dir}'")
+            raise ValueError("Invalid path: directory is outside of the allowed root")
+
+        # Analyze source files in the validated directory
+        self.analyze_sources(requested_path, ignore, g)

         logging.info("Done analyzing path")

     def analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None) -> Graph:
```
```diff
 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
     path = path.resolve()
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))
```
Check failure: Code scanning / CodeQL · Uncontrolled data used in path expression (High)

Copilot Autofix (AI, 9 days ago)
In general, to fix this kind of issue you must not let arbitrary user input select arbitrary filesystem roots. Instead, restrict paths to a safe base directory (or a fixed allow-list of roots) and/or treat the user-provided value only as a name within a controlled directory. This is done by (1) defining a safe root directory (for repositories or local folders), (2) constructing a candidate path by joining the root and the user input, (3) normalizing/resolving that path, and (4) verifying that the resolved path is still within the allowed root. If the check fails, return an error.

For this codebase, the best fix with minimal behavior change is:

- Introduce a single helper in `SourceAnalyzer` that:
  - Accepts the untrusted `path` string and a base directory `Path`.
  - Constructs `resolved = (base_dir / path).resolve()`.
  - Ensures `resolved.is_dir()` and that `resolved` is inside `base_dir` via `resolved.is_relative_to(base_dir)` (Python 3.9+) or a `try: resolved.relative_to(base_dir)` fallback.
- Use this helper in `analyze_local_folder` before calling `analyze_sources`. That way, every caller that passes a string path (including both `tests/index.py` and `api/index.py` flows) will be constrained to a configured base directory such as the current working directory or a specific environment-configurable root.
- Use the returned safe `Path` to call `analyze_sources`, so `path.rglob(...)` in `analyze_sources` always operates under the safe root.

Concretely:

- In `api/analyzers/source_analyzer.py`:
  - Add `import os` (standard library) since we'll read an optional env var for the base root.
  - Add a private method `_resolve_and_validate_path(self, path: str) -> Path` inside `SourceAnalyzer` before `analyze_local_folder`.
    - Read the base root from an environment variable like `CODEGRAPH_BASE_DIR` if present, otherwise default to `Path.cwd()`.
    - Resolve `base_root = base_root.resolve()`.
    - Build `candidate = (base_root / path).resolve()`.
    - Check `candidate.is_dir()` and that it is inside `base_root`. If not, raise `ValueError`.
  - Update `analyze_local_folder` to call this helper:
    - Replace `self.analyze_sources(Path(path), ignore, g)` with `safe_path = self._resolve_and_validate_path(path)` and then `self.analyze_sources(safe_path, ignore, g)`.

This keeps the public API of `SourceAnalyzer` unchanged while ensuring that all filesystem walks start from a safe, controlled base directory and no longer directly trust arbitrary absolute/relative paths from HTTP requests.
```diff
@@ -18,6 +18,7 @@
 from multilspy.multilspy_logger import MultilspyLogger

 import logging
+import os
 # Configure logging
 logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')

@@ -184,6 +185,33 @@
         # Second pass analysis of the source code
         self.second_pass(graph, files, path)

+    def _resolve_and_validate_path(self, path: str) -> Path:
+        """
+        Resolve a user-provided path string against a safe base directory
+        and ensure the result is an existing directory within that base.
+
+        The base directory can be configured via the CODEGRAPH_BASE_DIR
+        environment variable; if unset, the current working directory is used.
+        """
+        base_dir_env = os.environ.get("CODEGRAPH_BASE_DIR")
+        base_dir = Path(base_dir_env) if base_dir_env else Path.cwd()
+        base_dir = base_dir.resolve()
+
+        # Join the user-provided path to the base directory and resolve it
+        candidate = (base_dir / path).resolve()
+
+        # Ensure the resolved path is a directory
+        if not candidate.is_dir():
+            raise ValueError(f"Path '{path}' is not a valid directory within the allowed base directory.")
+
+        # Ensure the candidate path is within the base directory
+        try:
+            candidate.relative_to(base_dir)
+        except ValueError:
+            raise ValueError(f"Path '{path}' escapes the allowed base directory.")
+
+        return candidate
+
     def analyze_local_folder(self, path: str, g: Graph, ignore: Optional[list[str]] = []) -> None:
         """
         Analyze path.
@@ -195,8 +223,11 @@

         logging.info(f"Analyzing local folder {path}")

+        # Resolve and validate the user-provided path against a safe base directory
+        safe_path = self._resolve_and_validate_path(path)
+
         # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        self.analyze_sources(safe_path, ignore, g)

         logging.info("Done analyzing path")
```
```diff
 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
     path = path.resolve()
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + list(path.rglob("*.js"))
```
Check failure: Code scanning / CodeQL · Uncontrolled data used in path expression (High)

Copilot Autofix (AI, 9 days ago)
General approach: constrain user-controlled paths to a safe root, and normalize them before use. The analyzer should only traverse directories inside a configured "workspace root" (for example, an environment variable like `CODE_GRAPH_WORKSPACE_ROOT` or the current working directory), and should reject inputs that escape that root. Normalization (via `Path.resolve()` / `os.path.realpath`) must be done before checking containment.

Best concrete fix with minimal behavior change:

- Add a helper in `SourceAnalyzer` that takes an input path string, resolves it to an absolute `Path`, and enforces that it lies under an allowed root directory.
  - Determine the allowed root using an environment variable (e.g., `CODE_GRAPH_WORKSPACE_ROOT`) if present, otherwise default to the current working directory (`Path.cwd()`), which is safe and requires no extra configuration.
  - Use `Path.resolve()` on both the root and the user path.
  - Verify that `resolved_user_path == allowed_root` or `allowed_root in resolved_user_path.parents`. If not, log and raise a `ValueError`.
- Update `analyze_local_folder` to call this helper instead of blindly wrapping `path` with `Path(path)`. Pass the resulting safe `Path` into `analyze_sources`.
- This ensures that any usage of `analyze_local_folder` (from `tests/index.py` or `api/index.py`) inherits the same validation without further changes to those files.

All changes are limited to `api/analyzers/source_analyzer.py`. We'll need:

- `import os` (a well-known standard lib) to read the environment variable.
- A new private method `SourceAnalyzer._resolve_and_validate_path(self, path: str) -> Path`.
- Adjusted code in `analyze_local_folder` to use that method and handle its result.
```diff
@@ -18,6 +18,7 @@
 from multilspy.multilspy_logger import MultilspyLogger

 import logging
+import os
 # Configure logging
 logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')
@@ -184,6 +185,33 @@
         # Second pass analysis of the source code
         self.second_pass(graph, files, path)

+    def _resolve_and_validate_path(self, path: str) -> Path:
+        """
+        Resolve the given path and ensure it is contained within an allowed root.
+
+        The allowed root is determined by the CODE_GRAPH_WORKSPACE_ROOT environment
+        variable if set; otherwise, it defaults to the current working directory.
+        """
+        # Determine allowed root directory
+        workspace_root = os.environ.get("CODE_GRAPH_WORKSPACE_ROOT")
+        if workspace_root:
+            allowed_root = Path(workspace_root).resolve()
+        else:
+            allowed_root = Path.cwd().resolve()
+
+        resolved_path = Path(path).resolve()
+
+        # Ensure the resolved path is within the allowed root
+        if resolved_path != allowed_root and allowed_root not in resolved_path.parents:
+            logging.error(
+                "Rejected path '%s' as it is outside the allowed root '%s'",
+                resolved_path,
+                allowed_root,
+            )
+            raise ValueError(f"Path '{path}' is outside the allowed root.")
+
+        return resolved_path
+
     def analyze_local_folder(self, path: str, g: Graph, ignore: Optional[list[str]] = []) -> None:
         """
         Analyze path.
@@ -195,8 +223,11 @@

         logging.info(f"Analyzing local folder {path}")

+        # Resolve and validate the provided path before analyzing
+        safe_path = self._resolve_and_validate_path(path)
+
         # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        self.analyze_sources(safe_path, ignore, g)

         logging.info("Done analyzing path")
```
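The containment check in the new helper can be exercised in isolation. Below is a minimal sketch of the same rule (the paths used are illustrative); on Python 3.9+, `resolved.is_relative_to(allowed_root)` expresses both conditions at once.

```python
from pathlib import Path

def is_contained(path: str, allowed_root: Path) -> bool:
    # Same rule as _resolve_and_validate_path: accept the root itself
    # or any path that has the root among its ancestors.
    resolved = Path(path).resolve()
    return resolved == allowed_root or allowed_root in resolved.parents

root = Path("/tmp/workspace").resolve()
print(is_contained("/tmp/workspace/src/app.py", root))     # True
print(is_contained("/tmp/workspace/../outside.py", root))  # False
```

Note that `resolve()` normalizes `..` segments and follows symlinks, which is what prevents traversal tricks like the second example from escaping the root.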
Update `tree-sitter-javascript` from `>=0.23.2,<0.24.0` (no such version exists) to `>=0.25.0,<0.26.0` to match the `tree-sitter>=0.25.2` base dependency.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
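Assuming a standard PEP 621 layout (the surrounding entries shown here are illustrative), the resulting constraint in `pyproject.toml` would look roughly like:

```toml
[project]
dependencies = [
    "tree-sitter>=0.25.2",
    "tree-sitter-javascript>=0.25.0,<0.26.0",
]
```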
Actionable comments posted: 1
🧹 Nitpick comments (3)
api/analyzers/source_analyzer.py (1)
148-148: JavaScript second-pass symbol resolution is effectively disabled. With `NullLanguageServer`, `AbstractAnalyzer.resolve()` will throw on `lsp.request_definition(...)` and silently return `[]` (caught by the base class's blanket `except`). This means `resolve_type`/`resolve_method` in `JavaScriptAnalyzer` will never resolve any symbols; no `CALLS`, `EXTENDS`, or `PARAMETERS` edges will be created during the second pass.

This is fine for an initial implementation, but consider adding a TODO or logging a warning so it's clear that JS symbol resolution is a known gap. When ready, integrating a real JS/TS language server (e.g., `typescript-language-server`) would enable full graph connectivity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/source_analyzer.py` at line 148, the JavaScript language-server entry currently uses lsps[".js"] = NullLanguageServer(), which causes AbstractAnalyzer.resolve() to throw on lsp.request_definition(...) and prevents JavaScriptAnalyzer.resolve_type/resolve_method from producing CALLS/EXTENDS/PARAMETERS edges; update the code to mark this as a known limitation by adding a TODO and/or logging a clear warning where lsps[".js"] is set (or in NullLanguageServer.__init__), referencing lsps, NullLanguageServer, AbstractAnalyzer.resolve, lsp.request_definition, and JavaScriptAnalyzer.resolve_type/resolve_method so future work can replace NullLanguageServer with a real JS/TS language server (e.g., typescript-language-server) to enable second-pass symbol resolution.

api/analyzers/javascript/analyzer.py (2)
40-45: Docstring extraction only captures an immediately preceding single comment. This won't capture JSDoc blocks that are separated from the declaration by whitespace/newlines (since they wouldn't be the `prev_sibling`), nor will it capture multi-line comment blocks split across multiple `comment` nodes. For typical single-block JSDoc (`/** ... */`), this should work fine. Consider this a known limitation for the initial implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/javascript/analyzer.py` around lines 40 - 45, get_entity_docstring currently only returns node.prev_sibling if it's a single 'comment', which misses JSDoc blocks separated by whitespace/newlines and multi-line comment sequences; update get_entity_docstring to walk backwards from the declaration node via prev_sibling, skipping over non-significant whitespace/newline nodes and collecting contiguous 'comment' nodes (e.g., accumulate while prev_sibling.type == 'comment'), decode and join their .text values into a single string, then return that combined docstring for the function_declaration, class_declaration, and method_definition cases (refer to the get_entity_docstring function and the node.prev_sibling traversal).
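The backward traversal the prompt describes can be sketched independently of tree-sitter. The `Node` class below is a minimal stand-in mimicking tree-sitter's `type`, `text`, and `prev_sibling` attributes (names and structure are illustrative, not the project's actual code); note that real tree-sitter trees do not represent whitespace as nodes, so a plain sibling walk usually suffices.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Minimal stand-in for a tree-sitter Node (illustrative only).
    type: str
    text: bytes
    prev_sibling: Optional["Node"] = None

def collect_preceding_comments(node: Node) -> Optional[str]:
    # Walk backwards through siblings, gathering contiguous comment
    # nodes, then join them back in source order.
    comments = []
    sib = node.prev_sibling
    while sib is not None and sib.type == 'comment':
        comments.append(sib.text.decode('utf-8'))
        sib = sib.prev_sibling
    return '\n'.join(reversed(comments)) if comments else None

jsdoc = Node('comment', b'/** Adds two numbers. */')
fn = Node('function_declaration', b'function add(a, b) {}', prev_sibling=jsdoc)
print(collect_preceding_comments(fn))  # /** Adds two numbers. */
```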
68-72: Parameter query only captures simple identifiers. The tree-sitter query `(formal_parameters (identifier) @parameter)` will miss destructured parameters (`{a, b}`), default values (`a = 1`), and rest parameters (`...args`), which are common in modern JavaScript. This is acceptable for an initial implementation but worth noting as a known limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/javascript/analyzer.py` around lines 68 - 72, The current Tree-sitter query only matches simple identifier parameters; update the query passed to self.language.query in analyzer.py to also capture destructured, default, and rest forms by including nodes like formal_parameters (identifier | object_pattern | array_pattern | assignment_pattern | rest_pattern) `@parameter` (or the equivalent node names used by the JavaScript grammar), then adjust the captures handling (captures = query.captures(entity.node)) to iterate over captures['parameter'] as before and call entity.add_symbol("parameters", parameter) so destructured ({a,b}), default (a = 1), and rest (...args) parameters are included along with simple identifiers.
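The broadened capture the prompt describes could use tree-sitter's bracket alternation syntax. The exact node names (`object_pattern`, `assignment_pattern`, `rest_pattern`, etc.) are assumptions about the tree-sitter-javascript grammar and should be verified against it:

```scheme
(formal_parameters
  [
    (identifier)
    (object_pattern)
    (array_pattern)
    (assignment_pattern)
    (rest_pattern)
  ] @parameter)
```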
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 50-61: In add_symbols (method add_symbols on Entity handling
class_declaration), remove the unused assignment to superclass_node and
eliminate or simplify the misleading heritage variable: either rename heritage
to body_node and use it for a clear defensive check or drop the body existence
guard entirely since class bodies are required; keep the loop that detects
child.type == 'class_heritage' and call entity.add_symbol("base_class",
heritage_child) for heritage_child.type == 'identifier' unchanged.
---
Duplicate comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 1-13: The import statements in analyzer.py are already explicit
and clean, so no change is required; keep the explicit imports
(SyncLanguageServer, Entity, File, AbstractAnalyzer, tree_sitter_javascript as
tsjs, Language, Node, logging) as-is and ensure the logger variable logger =
logging.getLogger('code_graph') remains correctly named and used consistently
throughout this module.
---
Nitpick comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 40-45: get_entity_docstring currently only returns
node.prev_sibling if it's a single 'comment', which misses JSDoc blocks
separated by whitespace/newlines and multi-line comment sequences; update
get_entity_docstring to walk backwards from the declaration node via
prev_sibling, skipping over non-significant whitespace/newline nodes and
collecting contiguous 'comment' nodes (e.g., accumulate while prev_sibling.type
== 'comment'), decode and join their .text values into a single string, then
return that combined docstring for the function_declaration, class_declaration,
and method_definition cases (refer to the get_entity_docstring function and the
node.prev_sibling traversal).
- Around line 68-72: The current Tree-sitter query only matches simple
identifier parameters; update the query passed to self.language.query in
analyzer.py to also capture destructured, default, and rest forms by including
nodes like formal_parameters (identifier | object_pattern | array_pattern |
assignment_pattern | rest_pattern) `@parameter` (or the equivalent node names used
by the JavaScript grammar), then adjust the captures handling (captures =
query.captures(entity.node)) to iterate over captures['parameter'] as before and
call entity.add_symbol("parameters", parameter) so destructured ({a,b}), default
(a = 1), and rest (...args) parameters are included along with simple
identifiers.
In `@api/analyzers/source_analyzer.py`:
- Line 148: The JavaScript language-server entry currently uses lsps[".js"] =
NullLanguageServer(), which causes AbstractAnalyzer.resolve() to throw on
lsp.request_definition(...) and prevents
JavaScriptAnalyzer.resolve_type/resolve_method from producing
CALLS/EXTENDS/PARAMETERS edges; update the code to mark this as a known
limitation by adding a TODO and/or logging a clear warning where lsps[".js"] is
set (or in NullLanguageServer.__init__), referencing lsps, NullLanguageServer,
AbstractAnalyzer.resolve, lsp.request_definition, and
JavaScriptAnalyzer.resolve_type/resolve_method so future work can replace
NullLanguageServer with a real JS/TS language server (e.g.,
typescript-language-server) to enable second-pass symbol resolution.
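The TODO/warning asked for in the `NullLanguageServer` prompt above could look like the sketch below. This is a hypothetical illustration (the real `NullLanguageServer` in the codebase may be shaped differently); it shows one way to make the disabled JS resolution loud instead of silent.

```python
import logging

logger = logging.getLogger('code_graph')

class NullLanguageServer:
    """Stub LSP client: JS second-pass symbol resolution is a known gap."""

    def __init__(self) -> None:
        # TODO: replace with a real JS/TS language server
        # (e.g. typescript-language-server) to enable CALLS/EXTENDS edges.
        logger.warning(
            "JavaScript symbol resolution disabled: no language server configured"
        )

    def request_definition(self, *args, **kwargs):
        # The caller catches this and falls back to an empty result.
        raise NotImplementedError("NullLanguageServer cannot resolve symbols")
```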
```python
def add_symbols(self, entity: Entity) -> None:
    if entity.node.type == 'class_declaration':
        heritage = entity.node.child_by_field_name('body')
        if heritage is None:
            return
        superclass_node = entity.node.child_by_field_name('name')
        # Check for `extends` clause via class_heritage
        for child in entity.node.children:
            if child.type == 'class_heritage':
                for heritage_child in child.children:
                    if heritage_child.type == 'identifier':
                        entity.add_symbol("base_class", heritage_child)
```
Dead code and misleading variable names in add_symbols for class declarations.
Two issues:
- Line 55: `superclass_node` is assigned but never used (confirmed by Ruff F841). Remove it.
- Line 52: Variable named `heritage` but it fetches the `'body'` field, which is misleading. It's also only used for the `None` check on line 53, a defensive guard that's unlikely to trigger (classes without bodies are syntactically invalid JS).
🔧 Proposed fix
```diff
 def add_symbols(self, entity: Entity) -> None:
     if entity.node.type == 'class_declaration':
-        heritage = entity.node.child_by_field_name('body')
-        if heritage is None:
+        body = entity.node.child_by_field_name('body')
+        if body is None:
             return
-        superclass_node = entity.node.child_by_field_name('name')
         # Check for `extends` clause via class_heritage
         for child in entity.node.children:
```
🪛 Ruff (0.15.1)
[error] 55-55: Local variable superclass_node is assigned to but never used
Remove assignment to unused variable superclass_node
(F841)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/analyzers/javascript/analyzer.py` around lines 50 - 61, In add_symbols
(method add_symbols on Entity handling class_declaration), remove the unused
assignment to superclass_node and eliminate or simplify the misleading heritage
variable: either rename heritage to body_node and use it for a clear defensive
check or drop the body existence guard entirely since class bodies are required;
keep the loop that detects child.type == 'class_heritage' and call
entity.add_symbol("base_class", heritage_child) for heritage_child.type ==
'identifier' unchanged.
Fixes #51
Add support for JavaScript code analysis using tree-sitter.
- Add `api/analyzers/javascript/analyzer.py` implementing a `JavaScriptAnalyzer` class using tree-sitter for JavaScript.
- Update `api/analyzers/source_analyzer.py` to include `JavaScriptAnalyzer` in the analyzers list.
- Add the `tree-sitter-javascript` dependency to `pyproject.toml`.
- Add utility functions for JavaScript analysis in `api/analyzers/utils.py`.

For more details, open the Copilot Workspace session.
Summary by CodeRabbit